Abstract

Modern  satellites  tag  their  images  with  geo-location 
information  using  GPS  and  star  tracking  systems. 
Depending  on  the  quality  of  the  geo-positioning 
equipment,  geo-location  errors  may  range  from  a  few 
meters to tens of meters on the ground. At the current state
of the art, there is no established method to automatically
correct these errors, which limits the large-scale utilization of
satellite imagery. In this paper, an automatic geo-location
correction framework that corrects multiple
satellite images simultaneously is presented. As a result of
the  proposed  correction  process,  all  the  images  are 
effectively  registered  to  the  same  absolute  geodetic 
coordinate  frame.  The  usability  and  the  quality  of  the 
correction framework are shown through probabilistic 3-D
surface model reconstruction. The models given by
original  satellite  geo-positioning  meta-data  and  the 
corrected  meta-data  are  compared  and  the  quality 
difference  is  measured  through  an  entropy-based  metric 
applied  onto  the  high  resolution  height  maps  given  by  the 
3-D models. Measuring the absolute accuracy of the
framework is harder due to the lack of publicly available
high-precision ground surveys; however, the geo-locations of
images from exemplar satellites over different parts of the
globe are corrected, and the road networks given by
OpenStreetMap are projected onto the images using the
original and corrected meta-data to show the improved
quality of alignment.

Introduction 

With  the  advancements  in  satellite  imaging  technology, 
there  is  an  abundance  of  high  resolution  and  high  quality 
imagery  collected  on  a  daily  basis  from  around  the  globe 
by  many  satellites.  It  is  highly  desirable  to  utilize  these 
image  resources  to  generate  continuously  updated  high 
resolution  digital  elevation  models  (DEMs)  for  mapping, 
mensuration,  change  detection  and  event  monitoring 
purposes.  Many  Geographic  Information  System  (GIS) 
applications  would  also  benefit  greatly  if  existing  GIS  data 
such  as  road  networks  or  building  footprints  could  be  used 
in  conjunction  with  a  daily  stream  of  satellite  imagery. 


However,  even  the  most  modern  satellite  geo-positioning 
equipment  results  in  varying  degrees  of  geo-location 
errors on the ground, see Table 1.


Satellite       90% CE
GeoEye-1        2.5 meters
WorldView-1     7.6 meters
WorldView-2     12.2 meters
Quickbird       23 meters

Table 1: Geo-location accuracy of well-known satellites reported as 90%
Circular Error (CE) on the ground.

Figure  1:  Pointing  error  of  a 
satellite  can  be  well 
approximated  by  a  translation 
on  the  image  plane. 


These  errors  need  to  be  relatively  corrected  before  the 
images  from  different  satellites  can  be  used 
simultaneously as part of any 3-D reconstruction
algorithm, since triangulation requires the rays back-projected
from image features to intersect. Relative
correction  can  be  achieved  by  the  registration  of  the 
images  to  a  common  geodetic  coordinate  frame  which 
would  then  have  a  certain  absolute  accuracy.  Absolute 
accuracy  is  critical  for  any  application  that  requires  GIS 
data,  for  example  for  road  networks  to  align  well  with 
roads  in  images. 


Figure 2: a) Projection of corners of a building with known
geographic coordinates and elevation using the original RPC
camera of the image. b) Projections of the same corners using the
corrected RPC model.


Satellite  image  vendors  have  adopted  the  Rational 
Polynomial  Coefficient  (RPC)  model  to  represent  both  the 
internal  and  external  orientation  of  a  satellite  image  in  one 
set of equations [1]. The RPC model provides a function
that maps Earth coordinates given as (latitude, longitude,
elevation) to image pixels (u, v). The function can be
simplified as $u = F_{RPC}(lat, lon, elev)\,u_s + u_0$ and
$v = F_{RPC}(lat, lon, elev)\,v_s + v_0$ for some scaling parameters
$u_s, v_s$, some offset parameters $u_0, v_0$ and for a degree three
rational polynomial function $F_{RPC}$ with 80 coefficients.
Since the satellite camera is far (typically ~500 km) from
the Earth’s surface, the rays for individual pixels are
almost parallel to each other, as illustrated in Figure 1.
Thus, geo-positioning errors can be corrected by small
translations in the image plane. This type of correction is
termed bias correction in [2] and is shown to accurately
model the errors for images that are less than 50 km in
size. Mathematically, the problem is to compute a
correction offset, $(\Delta u_0, \Delta v_0)$, such that
$u = F_{RPC}(lat, lon, elev)\,u_s + u_0 + \Delta u_0$ and
$v = F_{RPC}(lat, lon, elev)\,v_s + v_0 + \Delta v_0$, see Figure 1 and Figure
2. Note that the correction offset geo-registers the entire
satellite image over a field of view of typically 40x40 km.
Depending on the geo-positioning error on the ground and
the resolution of the image, the correction offsets are on
the order of 5-10 pixels on the image plane. For the
images collected by satellites in Table 1, the worst case
correction offsets would range from 5 pixels in radius, for
GeoEye-1 imagery with ~0.5 meter GSD, up to 30 pixels
in radius for Quickbird imagery with ~1 meter GSD.
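The bias model can be illustrated with a minimal Python sketch. It assumes that the inputs are already normalized coordinates and that each image axis has its own numerator and denominator cubic polynomial of 20 coefficients (4 x 20 = 80 in total). The names `cubic_monomials`, `rational_cubic`, `project` and `toy_rpc` are hypothetical, and the monomial order is arbitrary rather than the order used in any vendor file format.

```python
from itertools import combinations_with_replacement


def cubic_monomials(x, y, z):
    """All 20 monomials of degree <= 3 in (x, y, z), in an arbitrary fixed order."""
    terms = [1.0]
    for degree in (1, 2, 3):
        for combo in combinations_with_replacement((x, y, z), degree):
            value = 1.0
            for factor in combo:
                value *= factor
            terms.append(value)
    return terms  # 1 + 3 + 6 + 10 = 20 terms


def rational_cubic(num, den, x, y, z):
    """Ratio of two cubic polynomials, each given by 20 coefficients."""
    m = cubic_monomials(x, y, z)
    return sum(c * t for c, t in zip(num, m)) / sum(c * t for c, t in zip(den, m))


def project(rpc, lat, lon, elev, du0=0.0, dv0=0.0):
    """Map normalized (lat, lon, elev) to pixels (u, v), plus an optional bias offset."""
    u = rational_cubic(rpc["u_num"], rpc["u_den"], lat, lon, elev) * rpc["u_s"] + rpc["u_0"] + du0
    v = rational_cubic(rpc["v_num"], rpc["v_den"], lat, lon, elev) * rpc["v_s"] + rpc["v_0"] + dv0
    return u, v


def unit(index):
    """Coefficient vector with a single 1.0 at the given monomial index."""
    coeffs = [0.0] * 20
    coeffs[index] = 1.0
    return coeffs


# Toy camera (an assumption for illustration): u depends only on lon, v only on lat.
toy_rpc = {"u_num": unit(2), "u_den": unit(0), "v_num": unit(1), "v_den": unit(0),
           "u_s": 1000.0, "u_0": 500.0, "v_s": 1000.0, "v_0": 500.0}

u, v = project(toy_rpc, 0.1, 0.2, 0.0)
u_corr, v_corr = project(toy_rpc, 0.1, 0.2, 0.0, du0=3.0, dv0=-2.0)
```

In this sketch the correction only shifts the projected pixel by the offset, which is the translation-only behavior the bias model assumes.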


Figure  3:  The  process  flow  diagram  of  the  proposed  RPC  offset 
correction  framework. 

In  this  paper,  a  bias  correction  framework  is  presented 
that  inputs  a  set  of  multi-view  satellite  images  with 
varying  geo-positioning  errors  and  registers  them  to  a 
common  absolute  geodetic  coordinate  frame.  First,  the 
images  are  grouped  according  to  degree  of  overlapping 
views and a correction offset, $(\Delta u_0, \Delta v_0)$, is computed for
each  image  in  each  group  using  image-to-image 
correspondences  and  a  bundle  adjustment  type  correction 
algorithm,  described  in  Section  1.4.2.  Second,  the  offsets 
are  refined  utilizing  a  3-D  edge  modeling  framework  [3] 
which  provides  denser  feature  registration  by 
reconstructing  contours  of  the  buildings,  roads  and  other 
structures  that  are  visible  in  all  the  images  and  contribute 
to  the  3-D  model.  Figure  3  shows  the  process  flow 
diagram  for  the  proposed  RPC  offset  correction 
framework. Specifically, the contribution of the paper is a
system for automatically correcting hundreds of satellite
images (each with ~40Kx40K pixels) with varying geo-location
errors and bringing the overall geo-location accuracy
to the accuracy of the best satellite. Section 1.4 presents a
novel relative correction algorithm which is used to turn the
semi-supervised 3-D edge based geo-correction algorithm
of [3] into a completely unsupervised algorithm. Section
1.5 explains how the unsupervised 3-D edge based
geo-correction algorithm is used in a consensus building
framework to minimize the risk of computing a wrong
offset for a given image.

1.1.  3-D  modeling  using  satellite  imagery 

In this paper, a probabilistic 3-D modeling application
using multi-view satellite imagery [3] is chosen as a use-case
for the proposed offset correction algorithm. This
algorithm represents the 3-D volume of a scene using a
regular grid of cubic volume elements (voxels), e.g. of size
1 m³, and computes a surface occlusion probability and a
surface appearance model for each voxel in the volume.
The  rays  from  the  available  images  are  cast  into  the 
volume  using  the  images’  camera  models  and  the  surface 
existence  probabilities  are  updated  simultaneously  with 
their  appearance  models  using  the  appearance  of  the  rays. 
It  is  shown  that  the  algorithm  converges  to  the  correct  3-D 
surface  model  as  more  images  are  used  to  update  the 
model [3]. The critical assumption is that the geo-positioning
errors of the input images are relatively
corrected such that the triangulation errors can be
absorbed within a voxel. For example, for voxels of 1 m³,
a circular error (CE) of 1 meter on the ground needs to be
achieved for all the input images. If the images are not
adequately  corrected  then  the  rays  of  image  pixels  do  not 
intersect  at  the  correct  voxels  and  the  3-D  surfaces  can  no 
longer  be  accurately  recovered.  As  the  relative  errors  get 
more  severe,  the  3-D  model  surface  heights  (Z  coordinate) 
become  noisier.  In  this  paper,  the  noise  level  in  the 
orthographic  height  maps  given  by  these  volumetric  3-D 
models  is  measured  to  show  the  quality  of  the  proposed 
relative  registration  framework. 

When  the  relative  geo-positioning  accuracy  of  the  input 
images  is  high,  the  3-D  surface  geometry  is  accurately 
recovered  in  the  common  geographic  coordinate  frame 
induced  by  the  input  images.  Any  absolute  error  in  the 
common  coordinate  frame  of  the  input  images  is  inherited 
by  the  geo-position  of  the  resulting  3-D  surface  model.  In 
other  words,  the  recovered  surface  geometry  is  relatively 
correct  but  may  require  a  global  translation  to  remove  any 
absolute  geo-positioning  error.  In  this  paper,  the 
improvement  in  the  absolute  accuracy  of  the  input  RPC 
models  is  demonstrated  through  the  projections  of 
OpenStreetMap  roads  and  buildings  onto  images  using  the 
original  and  corrected  RPC  models  of  the  images. 
1.2. Related Work

Using  the  bias  error  model  and  a  bundle  adjustment 
algorithm,  only  a  few  ground  control  points  are  shown  to 
reduce  the  geo-positioning  errors  down  to  sub-meter 
accuracy [4]. The critical factor is the precision of the 3-D
survey positions and the 2-D image measurements, termed
tie points, of the ground control points. In [2], it is
shown that 2-D measurements alone are sufficient to correct
the bias error using a block adjustment algorithm, given
enough tie points. The challenge for automatic RPC
correction is then the computation of these tie points as
correct and well localized image-to-image
correspondences. There is previous work [5,6] using
sparse  features  such  as  SIFT  computed  from  in-track 
stereo  satellite  imagery  to  automatically  create  many  tie 
points  and  then  remove  the  outliers  via  a  RANSAC 
procedure.  However,  in  this  paper  the  input  satellite 
imagery  is  assumed  to  come  from  different  satellites  and 
collected  at  different  times,  months  or  years  apart. 
Computing  tie  points  from  such  imagery  automatically  is 
more  challenging  due  to  illumination  changes,  seasonal 
and  atmospheric  appearance  differences,  occlusions  and 
parallax  effects  of  tall  buildings  when  projected  with  very 
different  viewpoints.  The  algorithm  proposed  in  this  paper 
(Section  2.1)  successfully  computes  a  rich  set  of  tie  points 
for  groups  of  images  with  significant  collection  time 
differences  and  coming  from  different  satellites,  without  a 
priori  ground  control  points. 

A  major  approach  is  to  compute  tie  points  between 
satellite  images  and  DEMs  [5]  or  renderings  from  DEMs 
[7],  and  use  these  tie  points  to  correct  the  bias  error. 
However,  this  line  of  work  inherits  the  absolute  and 
relative  inaccuracy  of  DEMs  as  well  as  the  errors 
introduced  by  resolution  and  surface  geometry  mismatches 
between  the  satellite  images  and  the  terrain-only  DEMs.  In 
this  paper,  tie  points  are  computed  only  between  images 
and thus any inaccuracy due to DEMs is avoided¹.

Geo-location  correction 

The proposed RPC camera offset correction framework
is comprised of three major algorithmic components: 1)
sparse image-to-image correspondence computation for a
group of images; 2) offset correction of a group of images
using a set of correspondences; and 3) offset
refinement/computation of an image using a 3-D edge
model. The process flow diagram of the proposed
framework using these algorithmic components is shown
in Figure 3 and each component is explained in the
following sections. The first two components
automatically generate tie points and reduce the errors
down to the positioning accuracy of the correct tie points; then
the edge modeling framework refines the offsets and brings
the errors down to the level of the edge model resolution.
Note that edge modeling is also capable of correcting bias
when the original cameras of the images are passed to it,
given that the edge model has been updated with 5-10
corrected images. This correction capability is used for
images that do not form enough tie points with other
images to support bundle adjustment. In that case, however, the edge
alignment search radius is increased to ensure that the
global minimum is obtained.

¹ Note that in this paper, ASTER DEM tiles (30 meter resolution) are also
used, albeit only to set reasonable estimates for minimum and maximum
elevations in a given area and to provide a rough ground plane to
constrain the search for registration.


Figure 4: a) A base image patch with detected Harris corners
(red points). b) Alignment of patches from other images in the
group on the base patch around a selected correspondence point
(marked with yellow plus sign). The base patch is shown in the
red channel and the aligned patches are shown in the green and
blue (cyan) channels.

1.3.  Image-to-image  correspondence  computation 

Given  a  group  of  satellite  images,  the  area  of  overlap 
(intersection)  between  the  input  images  is  computed  using 
the  footprints  of  the  images  given  in  the  meta-data,  see 
Figure  5a.  One  of  the  images  in  the  group  is  selected  as 
the  base  image  and  the  intersection  area  in  this  image  is 
regularly  cropped  into  smaller  image  patches 
corresponding  to  sizes  of  roughly  250  meter  by  250  meter 
on  the  ground,  Figure  4a.  The  Harris  corner  detection 
algorithm  is  then  run  on  each  base  patch  to  generate  a 
sparse  set  of  potential  correspondences  from  each  patch, 
Figure  4a.  Given  a  base  image  patch,  it  is  possible  to 
roughly  align  patches  from  the  other  images  that 
geographically  correspond  to  this  same  area  using  the  RPC 
models  of  the  images  and  a  digital  elevation  model  (DEM) 
as  a  rough  ground  plane.  Specifically,  the  corresponding 
image  patch  from  the  second  image  is  first  ortho-rectified 
[1]  and  then  projected  back  onto  the  base  patch  using  the 
base image’s RPC model. Depending on the geo-positioning
errors of the RPC models of the images and
the DEM ground plane accuracy, the alignment error can
be tens of pixels; however, the RPC model provides an
adequate initial alignment. This initial alignment is
corrected using an enhanced phase correlation (EPC)
algorithm [8] around each corner point and at multiple
scales across all images to account for resolution
differences.  The  corners  that  generate  correlation  peaks 
and  satisfy  a  threshold  requirement  are  returned  as  the 
image-to-image  correspondences  for  the  patch.  Figure  4b 
shows  the  final  alignment  of  the  patches  from  four  other 
images  in  the  group  relative  to  the  base  patch  around  a 
selected  correspondence  point.  Observe  that  the 
correspondence  point  is  well  localized  in  all  images  at  the 
correct  corner  of  the  corresponding  building  structure. 

EPC  successfully  avoids  erroneous  correspondences 
from  cloudy  regions  and  is  robust  to  the  illumination 
differences  and  parallax  effects  among  the  images.  When 
the  computed  correspondences  are  semantically  correct, 
i.e.  when  they  come  from  a  common  3-D  feature  point,  the 
localization  errors  are  observed  to  be  within  two  pixels. 
A two-pixel error is not sufficiently accurate for 3-D
reconstruction and must be further refined (Section 1.5);
however, it is good enough to be used as part of a bundle
adjustment type bias correction algorithm as described in
the next section (Section 1.4). It is observed that the
proposed  algorithm  can  generate  wrong  correspondences 
due  to  moving  objects,  vegetation  or  changes  in  the 
structure  of  the  objects  in  the  scene,  e.g.  construction  sites. 


Figure 5: a) Footprints of 6 GeoEye-1 images selected as a seed
group. b) Footprints of the images in the second group to be
corrected using 2 seeds. c) Footprints of all the images that are
corrected.

1.4.  RPC  offset  correction 

Given a group of satellite images and a set of image-to-image
correspondences, a random sample consensus
(RANSAC) procedure is first used to eliminate wrong
correspondences. The initial algorithm in this RANSAC
procedure computes a correction offset for each image in
the group using one image-to-image correspondence only
(Section 1.4.1). The distances between the correction offsets
given by different correspondences are computed in the
image domain, and the correspondences whose offsets differ
by less than 2 pixels are selected as inliers. The inlier
correspondences participate in the second algorithm
(Section 1.4.2), which refines the offsets through bundle
adjustment using all the correspondences at the same time.
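The inlier selection can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the paper: every correspondence is tried as a hypothesis in turn (instead of random sampling), and two correspondences are taken to agree when their offsets differ by less than 2 pixels in every image of the group. The function names `agree` and `select_inliers` are hypothetical.

```python
import math


def agree(offsets_a, offsets_b, tol_px=2.0):
    """True if two per-image offset lists differ by less than tol_px in every image."""
    return all(math.hypot(ua - ub, va - vb) < tol_px
               for (ua, va), (ub, vb) in zip(offsets_a, offsets_b))


def select_inliers(offsets_per_corr, tol_px=2.0):
    """Indices of the largest set of correspondences that agree with one hypothesis.

    offsets_per_corr[k][j] is the (du, dv) offset that correspondence k alone
    would assign to image j (the single-correspondence algorithm of Section 1.4.1).
    """
    best = []
    for hypothesis in offsets_per_corr:
        inliers = [k for k, other in enumerate(offsets_per_corr)
                   if agree(hypothesis, other, tol_px)]
        if len(inliers) > len(best):
            best = inliers
    return best


# Two images, four correspondences; the last one is a wrong match.
offsets = [
    [(5.0, -3.0), (1.0, 2.0)],
    [(5.5, -3.2), (0.8, 2.1)],
    [(4.8, -2.9), (1.3, 1.7)],
    [(-12.0, 9.0), (7.0, -4.0)],
]
inlier_ids = select_inliers(offsets)
```

Only the correspondences returned by `select_inliers` would then be passed on to the joint refinement.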

1.4.1  Offset  correction  using  one  image-to-image 
correspondence 

Using the RPC model of a satellite image, it is possible
to back-project a ray from any image point (u, v) onto a
horizontal plane with a given elevation, Z, and compute a
3-D point (X, Y, Z) as the intersection point of the ray with
the plane [9]. In the proposed correction scheme, given a
group of images and image-to-image correspondences, a
ray from each image-to-image correspondence point is
cast onto a series of horizontal planes with elevations in
the range $[Z_{min}, Z_{max}]$ and with $\Delta Z$ increments², see
Figure 6. For each plane, $z = Z_i$, a set of points on the
plane, $\{(X_i^j, Y_i^j)\}$, is generated, one for each image j. A
weighted mean $(\bar{X}_i, \bar{Y}_i)$ and a weighted scatter value $\sigma_i$ are
computed as

$$(\bar{X}_i, \bar{Y}_i) = \Big( \sum_j w_j X_i^j,\; \sum_j w_j Y_i^j \Big)$$

and

$$\sigma_i = \sum_j w_j (X_i^j - \bar{X}_i)^2 + \sum_j w_j (Y_i^j - \bar{Y}_i)^2$$

where the weights for each image are given such that
$\sum_j w_j = 1$. The purpose of weighting is to reflect the
relative reliability of the RPC camera model of an image
and is explained in Section 1.4.3. The mean position is the
estimate of the 3-D intersection point at the corresponding
plane elevation and the scatter value measures the
accuracy of ray intersection for this 3-D point. Observe
from Figure 6 that the scatter value is minimized around
the correct elevation of the 3-D intersection point induced
by this image-to-image correspondence. The elevation
with the minimum scatter, $\hat{Z}$ in Figure 6, is chosen as the
elevation of the 3-D intersection point of the image-to-image
correspondences. The 3-D intersection point of this
best z value, $(\hat{X}, \hat{Y}, \hat{Z})$, can then be projected back to the
images using the RPC models to get $(\hat{u}_j, \hat{v}_j)$ for image j
and the correction offsets are computed as
$(\Delta u_0^j, \Delta v_0^j) = (u_j - \hat{u}_j, v_j - \hat{v}_j)$ for each image. Also observe from
Figure 6 that by using the known ray geometry in 3-D,
parallax effects are automatically taken into account.
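The elevation sweep can be sketched in a few lines of Python. The sketch assumes each image's back-projection is available as a function from elevation Z to a ground point (X, Y); here a straight-line stand-in (`make_ray`, a hypothetical helper) replaces the real RPC back-projection of [9].

```python
def make_ray(x_at_zero, y_at_zero, slope_x, slope_y):
    """Toy back-projection: ground point hit by the ray on the plane z = Z."""
    return lambda z: (x_at_zero + slope_x * z, y_at_zero + slope_y * z)


def sweep_elevation(rays, weights, z_min, z_max, dz):
    """Return (X, Y, Z, scatter) of the plane with minimum weighted scatter.

    rays[j](Z) -> (X, Y) for image j; weights[j] = w_j with sum(w_j) == 1.
    """
    best = None
    steps = int(round((z_max - z_min) / dz))
    for i in range(steps + 1):
        z = z_min + i * dz
        pts = [ray(z) for ray in rays]
        x_mean = sum(w * x for w, (x, _) in zip(weights, pts))
        y_mean = sum(w * y for w, (_, y) in zip(weights, pts))
        scatter = sum(w * ((x - x_mean) ** 2 + (y - y_mean) ** 2)
                      for w, (x, y) in zip(weights, pts))
        if best is None or scatter < best[3]:
            best = (x_mean, y_mean, z, scatter)
    return best


# Three noise-free rays that all pass through the point (100, 200, 30).
rays = [make_ray(91.0, 197.0, 0.3, 0.1),
        make_ray(106.0, 188.0, -0.2, 0.4),
        make_ray(97.0, 215.0, 0.1, -0.5)]
x_hat, y_hat, z_hat, scatter = sweep_elevation(rays, [1 / 3, 1 / 3, 1 / 3], 0.0, 60.0, 1.0)
```

With real cameras the recovered point would then be projected into each image, and the difference between the measured and projected pixel gives that image's offset.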


Figure  6:  Offset  correction  using  an  optimal  estimate  for  a 
common  3-D  intersection  point  given  a  group  of  satellite  images 
and  one  image-to-image  correspondence  as  explained  in  Section 
1.4.1. 

² A viable range for Z values, $[Z_{min}, Z_{max}]$, is retrieved from the image
meta-data.

The proposed algorithm ensures that the images in the
group have zero relative pointing error as their rays are
moved to intersect perfectly at the correspondence point.
However, the absolute accuracy of the correction depends
on the absolute accuracy of the recovered 3-D intersection
point. Observe from Figure 1 that, assuming the pointing
errors of the rays are unbiased, their projected distribution
on a plane can be well approximated by a normal
distribution with zero mean. If it is assumed that on the
correct elevation plane, the $\{(X^j, Y^j)\}$ values are
distributed normally with zero mean around the correct
absolute position $(X, Y)$, then as the number of images $j \to \infty$, the mean value
$(\bar{X}_i, \bar{Y}_i)$ will approach $(X, Y)$. Thus, the proposed algorithm
approaches better absolute accuracy as the number of
images in the group is increased. In practice, groups of as
few as 5 images are found to achieve adequate absolute
accuracy for the GeoEye-1 imagery used in the
experiments for this paper.
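The averaging argument can be made explicit with a short worked equation. It relies on an additional assumption that the text does not state: the per-image ground errors $e_j$ are independent and share a common variance $s^2$. The symbols $e_j$, $s$ and $N$ (the number of images in the group) are introduced here for illustration only.

```latex
X^j = X + e_j, \qquad E[e_j] = 0, \qquad \mathrm{Var}[e_j] = s^2

\bar{X} = \sum_{j=1}^{N} w_j X^j, \qquad \sum_{j=1}^{N} w_j = 1

E[\bar{X}] = X, \qquad
\mathrm{Var}[\bar{X}] = s^2 \sum_{j=1}^{N} w_j^2 = \frac{s^2}{N}
\quad \text{for } w_j = \frac{1}{N}
```

Under these assumptions the standard deviation of the mean position shrinks as $1/\sqrt{N}$, and the same reasoning applies to the Y coordinate.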


Figure 7: Projections of the rooftop (shown in yellow lines) of a
building using a) original RPC cameras b) corrected RPC
cameras of a group of 4 images.

1.4.2  Offset correction using multiple image-to-image
correspondences

Given  a  group  of  images  and  a  set  of  inlier 
correspondences  which  generate  consistent  global  offsets 
for  the  images,  the  offsets  are  refined  using  a  bundle 
adjustment  framework  where  the  3-D  intersection  points 
are  refined  simultaneously  with  the  camera  correction 
offsets. Mathematically, the following reprojection error is
minimized using the Levenberg-Marquardt optimization
algorithm:

$$\sum_{k=1}^{M} \sum_{j} \left\| \left( U_{RPC}(X_k, Y_k, Z_k; \Delta u_0^j),\; V_{RPC}(X_k, Y_k, Z_k; \Delta v_0^j) \right) - (u_j, v_j)_k \right\|^2$$

where $(u_j, v_j)_k$ is the jth image coordinate of the kth image-to-image
correspondence, $(X_k, Y_k, Z_k)$ is the current 3-D
intersection point of the kth correspondence and the current
offset of the jth image, $(\Delta u_0^j, \Delta v_0^j)$, is fixed across the
correspondences. The 3-D intersection points are
initialized using the algorithm described in Section 1.4.1
for each image-to-image correspondence and the camera
offsets $(\Delta u_0^j, \Delta v_0^j)$ are initialized with the values given by
one of the inlier correspondences. Figure 7 shows a group
of 4 GeoEye-1 images and the projections of the corners
of a building with known height onto each image with the
original and the corrected RPC models.
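The objective can be made concrete with a small sketch. It is not the Levenberg-Marquardt solver used in the paper: it only evaluates the reprojection error and shows the closed-form offset update that holds when the 3-D points are kept fixed, since the offsets enter the projection as pure translations. The toy cameras, the function names, and the assumption that every image observes every correspondence are all illustrative.

```python
def reprojection_error(project_fns, offsets, points, observations):
    """Sum of squared pixel residuals over correspondences k and images j.

    project_fns[j](X, Y, Z) -> (u, v) with the uncorrected camera of image j,
    offsets[j] = (du0, dv0), points[k] = (X, Y, Z), observations[k][j] = (u, v).
    """
    total = 0.0
    for point, obs in zip(points, observations):
        for fn, (du, dv), (u_obs, v_obs) in zip(project_fns, offsets, obs):
            u, v = fn(*point)
            total += (u + du - u_obs) ** 2 + (v + dv - v_obs) ** 2
    return total


def best_offsets_for_fixed_points(project_fns, points, observations):
    """With points held fixed, each image's least-squares offset is its mean residual."""
    offsets = []
    for j, fn in enumerate(project_fns):
        res = [(obs[j][0] - fn(*p)[0], obs[j][1] - fn(*p)[1])
               for p, obs in zip(points, observations)]
        n = len(res)
        offsets.append((sum(r[0] for r in res) / n, sum(r[1] for r in res) / n))
    return offsets


# Toy affine cameras and synthetic observations with known biases.
cameras = [lambda X, Y, Z: (X + 0.2 * Z, Y),
           lambda X, Y, Z: (X - 0.3 * Z, Y + 0.1 * Z)]
true_bias = [(4.0, -2.0), (-6.0, 3.0)]
points = [(10.0, 20.0, 5.0), (50.0, 60.0, 0.0), (30.0, 10.0, 12.0)]
observations = [[(cam(*p)[0] + b[0], cam(*p)[1] + b[1])
                 for cam, b in zip(cameras, true_bias)] for p in points]

error_before = reprojection_error(cameras, [(0.0, 0.0)] * 2, points, observations)
estimated = best_offsets_for_fixed_points(cameras, points, observations)
error_after = reprojection_error(cameras, estimated, points, observations)
```

A full bundle adjustment would alternate or jointly optimize this offset step with updates of the 3-D points, which is what the Levenberg-Marquardt iteration does in a single nonlinear solve.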


Figure 8: a) The 400 m x 400 m scene volumes initialized at the
intersection area of the seed images shown in Figure 5a. b) An
example edge image used in updating the 3-D edge model. c) An
orthographic projection of the 3-D edge features of a model
updated with 40 edge images with corrected RPC models. Red
lines show the projections of OpenStreetMap roads onto the
orthographic expected edge image to demonstrate the absolute
accuracy of the input corrected RPC cameras.

1.4.3  Grouping  images  and  selection  of  seed  group 

Given a set of images to correct with varying geo-positioning
errors and resolution, it is difficult to find
mutual correspondences among all the images. The reason
is that the overlap area gets smaller as more images are
used, and varying resolution and cloud coverage across
images make it difficult to find good correspondences. In
practice,  it  is  easier  to  generate  groups  of  3-10  images 
with  a  good  correspondence  set.  One  strategy  is  to  group 
all  the  images  into  groups  with  3-10  images  and  run  the 
group  correction  algorithm  independently  on  each  group; 
however,  the  global  accuracy  of  each  group  depends  on 
the  geo-positioning  errors  of  the  members  of  each  group 
and  may  vary  from  group  to  group  resulting  in  relative 
geo-location  inconsistency  among  the  groups.  A  better 
strategy  is  to  generate  a  seed  group  from  images  with  best 
resolution  and  smallest  geo-positioning  errors  to  establish 
a  coordinate  frame  with  best  possible  absolute  accuracy, 
see  Figure  5a.  This  seed  group  is  corrected  first  using  the 
group  correction  algorithm  (Sections  1.4.1  and  1.4.2) 
where  the  weights  of  all  images  in  the  group  are  set  to  be 
equal.  Then  a  second  group  of  3-10  images  is  formed 
where  2  images  come  from  the  seed  group,  Figure  5b.  The 
weights  of  the  seed  images  in  this  group  are  set  to  0.5  and 
the  weights  of  the  other  images  are  set  to  0.  With  this 
weight  setting,  3-D  intersection  points  during  group 
correction  procedure  are  entirely  determined  by  the  two 
seed  images  in  the  group,  effectively  registering  the  other 
images  in  this  group  to  the  seed’s  absolute  coordinate 
frame. Then a third group is formed, and so on. All the remaining
images can be grouped in this manner, such that two
images in each group are already corrected, and the groups are
then corrected sequentially. Figure 5c shows footprints of
100 images that are corrected starting from the initial 6
seed images. Observe that the coverage area of images
with geo-locations corrected to a common absolute
coordinate frame increases beyond the initial seed coverage
and new images that intersect with this larger area can
now be corrected using the same group-and-correct
algorithm. Geographically distant areas can be corrected
using different seed groups and growing the geo-located
images around them if enough images to connect to them
cannot be found.
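The weighting scheme of this section can be summarized in a short sketch. The function names are hypothetical, and splitting the weight evenly over however many already-corrected images a group contains is an assumption that generalizes the two-seed case (0.5 each) described above.

```python
def group_weights(group, corrected):
    """Weights w_j for one group: equal for a seed group, seeds-only otherwise."""
    seeds = [img for img in group if img in corrected]
    if not seeds:
        # Seed group: no image is corrected yet, so all images count equally.
        return {img: 1.0 / len(group) for img in group}
    # Later groups: only already-corrected images determine the 3-D points.
    return {img: (1.0 / len(seeds) if img in corrected else 0.0) for img in group}


def plan_sequential_correction(groups):
    """Assign weights group by group, growing the set of corrected images."""
    corrected = set()
    plan = []
    for group in groups:
        plan.append(group_weights(group, corrected))
        corrected.update(group)
    return plan


plan = plan_sequential_correction([["a", "b", "c"], ["a", "b", "d", "e"]])
```

In the second group of this example, images "a" and "b" carry weight 0.5 each and the new images carry weight 0, so the new images are registered to the frame fixed by the seed group.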


Figure 9: Footprints of all the satellite images used in the
experiments in this paper collected over a) India and b) Jordan
regions by the GeoEye-1, WorldView-1, WorldView-2 and Quickbird
satellites over a period of 5 years.

1.5.  Offset  refinement/correction  via  3-D  edge 
modeling 

The absolute accuracy of the corrected geo-locations of
imagery using the framework of Section 1.4 depends on
the quality of the image-to-image correspondences given
by the algorithm described in Section 1.3. The wrong
correspondences are discarded with the RANSAC
framework; however, the inlier correspondences may still
have localization errors of up to 2 pixels. In practice, a
second  round  of  refinement  using  a  3-D  edge  model  [10] 
is  found  to  be  necessary  to  ensure  that  all  the  images  are 
relatively  corrected  to  a  level  determined  by  the  voxel 
resolution  of  the  model. 

The  3-D  edge  models  are  volumetric  models  as  in  [3] 
and  are  updated  with  edges  computed  from  the  input 
satellite  images  instead  of  image  intensity,  Figure  8b.  The 
details  of  the  update  equations  are  given  in  [10]  but 
essentially  every  voxel  in  the  3-D  scene  volume  stores  an 
edge  existence  probability  which  converges  to  1  as  more 
image  rays  contribute  with  an  edge  ray  piercing  that  voxel. 
In  [10],  a  semi-supervised  bias  correction  algorithm  is 
presented  where  the  3-D  edge  model  is  initialized  with  a 
set  of  seed  images  which  are  manually  corrected.  Then  the 
rest  of  the  images  are  aligned  by  correlation  to  the 
projections  of  the  3-D  edge  features  in  the  current  model 
onto  the  input  image.  In  this  paper,  the  same  edge 
correlation  based  correction  algorithm  is  used  but  the  bias 
correction  of  the  seed  images  is  automated  using  the 
algorithms  described  in  Sections  1.4.1  and  1.4.2.  The  final 
absolute  accuracy  is  typically  better  than  the  seed  accuracy 
due  to  the  averaging  effect  of  the  edge  accumulation  in 
each voxel. Figure 8c shows orthographic projections of 3-D
edge features recovered using this correct-and-update
algorithm. Observe that most of the building roof contours
are  recovered  along  with  some  road  features. 
OpenStreetMap  roads  (in  red)  are  projected  onto  this  edge 
image  to  show  the  final  absolute  accuracy  of  the  model. 
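The alignment-by-correlation step can be illustrated with a simplified sketch. It assumes binary edge pixels and an exhaustive search over integer offsets, whereas the method of [10] works with edge probabilities; the function name `best_edge_offset` and the toy data are hypothetical.

```python
def best_edge_offset(image_edges, model_edges, radius):
    """Integer (du, dv) within `radius` that maximizes overlapping edge pixels.

    image_edges: (u, v) edge pixels detected in the image.
    model_edges: (u, v) projections of 3-D edge features with the uncorrected camera.
    """
    image_set = set(image_edges)
    best_score, best_shift = -1, (0, 0)
    for du in range(-radius, radius + 1):
        for dv in range(-radius, radius + 1):
            score = sum((u + du, v + dv) in image_set for u, v in model_edges)
            if score > best_score:
                best_score, best_shift = score, (du, dv)
    return best_shift, best_score


# L-shaped contour projected from the model; the image shows it shifted by (3, -2).
model = [(u, 0) for u in range(10)] + [(0, v) for v in range(1, 10)]
image = [(u + 3, v - 2) for u, v in model] + [(50, 50)]  # plus one clutter pixel
shift, score = best_edge_offset(image, model, radius=5)
```

The search radius plays the role described in the text: a few pixels when the image was already corrected by tie points, and a radius derived from the satellite's geo-positioning error otherwise.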


Figure 10: Orthographic height maps given by 3-D models after
40 image updates. The first column shows an example (cropped)
input image for the scene, the second column shows a height
map given by a model using original RPC cameras of the input
images and the last column shows the height map when the input cameras are
corrected by the proposed framework. Observe from the last
column that the streets, building shapes and the tree heights are
recovered better with the corrected cameras. Average entropy
values printed with the height maps: 0.92 and 0.24 for the first
scene, 0.76 and 0.52 for the second scene.

The 3-D edge model volumes are initialized to have 1 m³
voxel resolution and a 400x400 m² local horizontal
span to minimize Earth’s curvature effects during
reconstruction, Figure 8a. The heights of the 3-D edge
model volumes are initialized using ASTER DEM tiles of
the area, with an 80 meter margin added on top of the highest
terrain elevation to account for buildings. The quality of
the edge-based corrections varies depending on the presence
of surfaces in the scene that can generate a set of edges.
However, note that the satellite images have a swath width
of ~50 km and one image intersects many much smaller 3-D
volumes, Figure 8a. In this paper, a RANSAC
procedure  is  proposed  to  correct  the  bias  errors  of  a  given 
image  using  all  the  3-D  edge  models  that  intersect  it.  In 
order  to  correct  an  image,  a  consensus  among  the  models 
is  sought  and  the  correction  offset  of  the  inlier  model  with 
the  largest  consensus  is  selected.  The  3-D  edge  models  are 
initialized  on  the  intersection  area  of  the  seed  images 
given  by  Section  1.4.3,  Figure  8a,  and  these  seed  images 
with  corrected  cameras  are  used  to  update  the  model. 
Given a non-seed image that was already corrected by the
algorithm in Section 1.4.3, the search space of edge
image correlation is significantly reduced and its offset is
refined by the consensus amount given by all the scenes.
In practice, correction offsets are refined by only a few
pixels. Given a non-seed image that was not corrected
by the algorithm in Section 1.4.3 (e.g. due to a lack of
enough tie points to any of the groups), the search
space of edge image correlation is set according to the
geo-positioning error of the satellite and its correction
offset is similarly set by the consensus amount given by all
the scenes. In both situations, the final absolute accuracy
of a corrected image is the same as the accuracy of the
corrected seed imagery. Note that this final stage is
capable of geo-correcting images using only a handful of
automatically corrected seed images. However, grouping
and relatively correcting imagery as described in Section
2.2.3 is beneficial as it reduces the search space of the 3-D
edge-based correction.
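The consensus step described above can be illustrated with a minimal sketch. It assumes that each intersecting 3-D edge model has already produced one candidate 2-D image offset (in pixels) through edge image correlation. The function name `consensus_offset`, the inlier tolerance `tol_px` and the threshold `min_inliers` are hypothetical and are not given in the paper. Because a hypothesis needs only a single model, the sketch tests every model's offset instead of sampling randomly, which is a deterministic variant of RANSAC's hypothesize-and-verify loop.

```python
from math import hypot


def consensus_offset(offsets, tol_px=1.0, min_inliers=2):
    """Select the per-model offset supported by the largest consensus.

    offsets     : list of (du, dv) pixel offsets, one per 3-D edge model
                  (assumed to come from edge image correlation).
    tol_px      : hypothetical inlier tolerance in pixels.
    min_inliers : hypothetical minimum consensus size; below it the
                  image is treated as uncorrected.

    Returns ((du, dv), inlier_indices), or None when no consensus exists.
    """
    best = None
    for du, dv in offsets:
        # Models whose offsets agree with this hypothesis are inliers.
        inliers = [j for j, (u, v) in enumerate(offsets)
                   if hypot(u - du, v - dv) <= tol_px]
        if best is None or len(inliers) > len(best[1]):
            best = ((du, dv), inliers)
    if best is None or len(best[1]) < min_inliers:
        return None
    return best


# Three models agree on roughly (10, -4); two are outliers.
candidates = [(10.0, -4.0), (10.5, -4.2), (9.8, -3.9),
              (35.0, 12.0), (-20.0, 3.0)]
result = consensus_offset(candidates)
```

Returning `None` corresponds to the lack-of-consensus failure case, in which an image is left uncorrected.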


Figure  11:  A  plot  showing  average  entropy  values  (y  axis)  for 
20  randomly  selected  scenes  in  a)  India  and  b)  Jordan  regions. 

Experiments  and  Results 

For the experiments in this paper, two geographic areas
in Jordan and India are selected, with roughly 200 satellite
resources in each region (Figure 9). The images come from the
GeoEye-1, Worldview1, Worldview2 and Quickbird
satellites.

Both regions have high to medium urban coverage that
enables computation of a rich set of image-to-image
correspondences and edge features for refinement. All the
images are run through the proposed bias correction
system of Section 2, where the seed groups are selected
among GeoEye-1 imagery for best possible absolute
accuracy (see Table 1). The system successfully computes an
offset for 67% of the images. The failure cases are
due to a lack of sufficient tie points and/or a lack of consensus
during 3-D edge feature based refinement/correction, which are
caused by clouds or haze in the images. The uncorrected
images are discarded by the system. The relative accuracy
of the corrected images is verified by manually labeling
one selected building (as in Figure 7), and the projection
errors are found to be less than 1 pixel for each image.

For further evaluation of the bias corrections, the 3-D
scene modeling framework of [3,10] is used with
400 × 400 × 100 m³ scene volumes of 1 m³ voxel resolution.


The reduction in the noise level of the output models is
measured by comparing models built with the original RPC
cameras against models built with the corrected RPC cameras
of the same input images. A volumetric 3-D model usually needs
around 40 image updates to converge, so the scenes are updated
with at least 40 images.




Figure  12:  Projections  of  OpenStreetMap  roads  onto  a)  a 
Quickbird  image  over  India  region  b)  two  Worldview2  images 
over  Jordan  region  using  their  bias  corrected  cameras,  c) 
Projections  of  building  footprints  given  by  OpenStreetMap  over 
India  region  (the  original  projection  is  shown  in  green  and  the 
projection  with  corrected  cameras  is  shown  in  red). 

The  resulting  orthographic  height  maps  of  the  recovered 
3-D  surfaces  are  dramatically  different  depending  on  the 
initial  errors  of  the  input  images.  Figure  10  shows  two 
example  scenes  from  both  regions  and  their  height  maps 
with original and corrected cameras. The noise level in
these height maps is measured as the average entropy of
2 meter by 2 meter patches in the height map. The entropy
value depends on the observed structure in the patch: for
example, flat areas have entropy close to zero, while
vegetation patches give values around 2. Completely
random patches have an entropy value of 8 with 256 possible
height values (the absolute height maps are shifted to the
[0, 255] range such that each increment is 1 meter). The
actual value of the average entropy of a height map varies
depending on the scene content; however, it is expected
that the average entropy will be reduced significantly
when the models converge to less noisy surfaces with bias
corrected cameras. Observe from Figure 10 that this
is indeed the case for both examples. Figure 11 shows a
plot of the average entropy values for 20 randomly
selected scenes in the areas covered by the corrected
satellite resources of Figure 9; significant reductions in
entropy are observed for all of them.
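The patch-based entropy measure can be sketched as follows. This is a minimal illustration that assumes the height map is a 2-D grid of integer heights already shifted to the [0, 255] range, and that the patch side is given in pixels. The function names and the `patch` parameter are hypothetical, and how a 2 meter patch maps to pixels depends on the height-map resolution, which is not fixed here. The sketch uses Shannon entropy in bits over the height values observed inside each non-overlapping patch.

```python
from collections import Counter
from math import log2


def patch_entropy(values):
    """Shannon entropy (bits) of the height values in one patch."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def average_entropy(height_map, patch=2):
    """Mean entropy over non-overlapping patch x patch tiles.

    height_map : list of equal-length rows of integer heights in [0, 255].
    patch      : tile side in pixels (an assumed parameter).
    Trailing rows/columns that do not fill a whole tile are ignored.
    """
    rows, cols = len(height_map), len(height_map[0])
    entropies = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tile = [height_map[r + i][c + j]
                    for i in range(patch) for j in range(patch)]
            entropies.append(patch_entropy(tile))
    if not entropies:
        raise ValueError("height map is smaller than one patch")
    return sum(entropies) / len(entropies)


flat = [[5, 5, 5, 5], [5, 5, 5, 5]]    # flat ground: zero entropy
mixed = [[5, 5, 1, 2], [5, 5, 3, 4]]   # one flat tile, one noisy tile
```

A tile of n pixels can hold at most log2(n) bits of entropy, so the attainable range of the measure depends on how many height samples fall inside each patch.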

The  entropy  measure  demonstrates  the  relative  accuracy 
of  the  bias  corrected  cameras.  The  absolute  accuracy  of 
the  proposed  framework  can  be  demonstrated  by 
projecting  roads  and  building  footprints  given  by  any  GIS 
resource.  In  this  paper  the  publicly  available 
OpenStreetMap  database  is  used  for  this  purpose.  Figure 
12a,b  show  projections  of  roads  onto  images  with  large 
bias  corrections.  Note  that  OpenStreetMap  accuracy 
depends  on  the  accuracy  of  the  GPS  equipment  used  by 
the  contributor  of  the  data  points  so  it  is  not  realistic  to 
expect  exact  alignment  of  these  features;  however,  the 
alignments  are  much  improved  when  the  bias  corrected 
cameras  are  used. 

For  the  building  projections,  the  height  maps  given  by 
the  3-D  models  are  used  with  the  bias  corrected  cameras. 
The  alignments  are  compared  to  the  projections  using 
ASTER DEM elevations and original RPC models. It can
be observed from Figure 12c that the footprints align
better with the roof-tops of the corresponding buildings
when the bias corrected RPC models are used. It should also
be observed that the projections differ more significantly
as the building height increases, since the 3-D surface
model gives better elevation estimates than ASTER DEM.

Conclusions 

A  fully  automated  registration  framework  is  presented 
to  align  multiple  satellite  images  simultaneously  to  a 
common  absolute  coordinate  frame.  The  framework 
corrects  imagery  from  different  satellites  with  varying 
degrees  of  geo-positioning  errors.  The  absolute  accuracy 
of  the  final  alignment  is  bounded  by  the  geo-positioning 
errors of the seed group. For the experiments in this paper,
GeoEye-1 images are selected to form seed groups,
yielding less than 2.5 meter absolute error, which is
demonstrated by the improved alignment of OSM
roads when projected onto the corrected imagery. The
relative accuracy of registration to a common coordinate
system is demonstrated through the improvement of the
triangulation accuracy of scene surfaces as part of a 3-D
modeling framework when the corrected RPC camera
models of the images are used.